Fine-tuning on Clean Data for End-to-End Speech Translation: FBK @ IWSLT 2018
This paper describes FBK's submission to the end-to-end English-German speech
translation task at IWSLT 2018. Our system relies on a state-of-the-art model
based on LSTMs and CNNs, where the CNNs reduce the temporal dimension of
the audio input, which is typically much longer than machine translation
input. Our model was trained only on the audio-to-text parallel
data released for the task, and fine-tuned on cleaned subsets of the original
training corpus. The addition of weight normalization and label smoothing
improved the baseline system by 1.0 BLEU point on our validation set. The final
submission also featured checkpoint averaging within a training run and
ensemble decoding of models trained during multiple runs. On test data, our
best single model obtained a BLEU score of 9.7, while the ensemble obtained a
BLEU score of 10.24. Comment: 6 pages, 2 figures, system description at the 15th International
Workshop on Spoken Language Translation (IWSLT) 2018
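The checkpoint averaging mentioned above is a standard trick: the parameters of several checkpoints saved during one training run are averaged element-wise before decoding. A minimal sketch in PyTorch follows; the file names and the choice of the last five checkpoints are illustrative assumptions, not details taken from the paper.

import torch

def average_checkpoints(paths):
    # Element-wise average of the parameter tensors of several saved state_dicts.
    avg = None
    for path in paths:
        state = torch.load(path, map_location="cpu")
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k, v in state.items():
                avg[k] += v.float()
    for k in avg:
        avg[k] /= len(paths)
    return avg

# Hypothetical usage: average the last five checkpoints of a run and save the result.
averaged = average_checkpoints([f"checkpoint_{i}.pt" for i in range(45, 50)])
torch.save(averaged, "averaged.pt")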
The ITC-irst statistical machine translation system for IWSLT-2004
This paper focuses on the statistical machine translation system developed at ITC-irst. The system was employed in the evaluation campaign of the International Workshop on Spoken Language Translation 2004, in all three data set conditions of the Chinese-English track. Both the statistical model underlying the system and the system architecture are presented. Moreover, details are given on how the submitted runs were produced.
Enhancing Transformer for End-to-end Speech-to-Text Translation
Neural end-to-end architectures have been recently proposed for spoken language translation (SLT), following the state-of-the-art results obtained in machine translation (MT) and speech recognition (ASR). Motivated by this contiguity, we propose an SLT adaptation of Transformer (the state-of-the-art architecture in MT), which exploits the integration of ASR solutions to cope with long input sequences featuring low information density. Long audio representations hinder the training of large models due to Transformer's quadratic memory complexity. Moreover, for the sake of translation quality, handling such sequences requires capturing both short- and long-range dependencies between bidimensional features. Focusing on Transformer's encoder, our adaptation is based on: i) downsampling the input with convolutional neural networks, which enables model training on non cutting-edge GPUs, ii) modeling the bidimensional nature of the audio spectrogram with 2D components, and iii) adding a distance penalty to the attention, which is able to bias it towards short-range dependencies. Our experiments show that our SLT-adapted Transformer outperforms the RNN-based baseline both in translation quality and training time, setting the state-of-the-art performance on six language directions.
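The distance penalty in point iii) is, in essence, a position-dependent bias subtracted from the attention logits so that the softmax favours nearby time steps. The sketch below shows one plausible form of such a penalty (a logarithmic function of the distance between positions); the exact formulation used in the paper is not reproduced here.

import torch

def distance_penalty(seq_len):
    # penalty[i, j] grows with |i - j| and is zero on the diagonal
    idx = torch.arange(seq_len)
    dist = (idx[None, :] - idx[:, None]).abs().float()
    return torch.log1p(dist)

def penalized_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_model); plain scaled dot-product attention
    # with the distance penalty subtracted from the scores before the softmax.
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    scores = scores - distance_penalty(q.size(1))
    return torch.softmax(scores, dim=-1) @ v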
Gender in Danger? Evaluating Speech Translation Technology on the MuST-SHE Corpus
Translating from languages without productive grammatical gender like English
into gender-marked languages is a well-known difficulty for machines. This
difficulty is also due to the fact that the training data on which models are
built typically reflect the asymmetries of natural languages, gender bias
included. Exclusively fed with textual data, machine translation is
intrinsically constrained by the fact that the input sentence does not always
contain clues about the gender identity of the human entities it refers to. But
what happens with speech translation, where the input is an audio signal? Can
audio provide additional information to reduce gender bias? We present the
first thorough investigation of gender bias in speech translation, contributing
with: i) the release of a benchmark useful for future studies, and ii) the
comparison of different technologies (cascade and end-to-end) on two language
directions (English-Italian/French). Comment: 9 pages of content, accepted at ACL 2020
The Multilingual TEDx Corpus for Speech Recognition and Translation
We present the Multilingual TEDx corpus, built to support speech recognition (ASR) and speech translation (ST) research across many non-English source languages. The corpus is a collection of audio recordings from TEDx talks in 8 source languages. We segment transcripts into sentences and align them to the source-language audio and target-language translations. The corpus is released along with open-sourced code enabling extension to new talks and languages as they become available. Our corpus creation methodology can be applied to more languages than previous work, and creates multi-way parallel evaluation sets. We provide baselines in multiple ASR and ST settings, including multilingual models to improve translation performance for low-resource language pairs.
The IWSLT 2016 Evaluation Campaign
The IWSLT 2016 Evaluation Campaign featured two tasks:
the translation of talks and the translation of video conference
conversations. While the first task extends previously
offered tasks with talks from a different source, the second
task is completely new. For both tasks, three tracks were
organised: automatic speech recognition (ASR), spoken language
translation (SLT), and machine translation (MT). The main
translation directions offered were English to/from German
and English to French. Additionally, the MT track included
English to/from Arabic and Czech, as well as French to
English. This year we received run submissions
from 11 research labs. All runs were evaluated with objective
metrics, while submissions for two of the MT talk tasks
were also evaluated with human post-editing. Results of the
human evaluation show improvements over the best submissions
of last year.
The IWSLT 2018 Evaluation Campaign
The International Workshop on Spoken Language Translation
(IWSLT) 2018 Evaluation Campaign featured two tasks: the
low-resource machine translation task and the speech translation
task. In the first task, manually transcribed speech had to
be translated from Basque to English. Since this translation
direction is an under-resourced language pair, participants
were encouraged to use additional parallel data from
related languages. In the second task, participants had to
translate English audio into German text by building a full
speech-translation system. In the baseline condition, participants
were free to use any architecture, while they were restricted
to a single model for the end-to-end task.
This year, eight research groups took part in the Basque-English
translation task, and nine in the speech translation
task.